Credit Assignment For Collective Multiagent RL With Global Rewards

Nguyen, Duc Thien, Kumar, Akshat, Lau, Hoong Chuin

Neural Information Processing Systems

Scaling decision-theoretic planning to large multiagent systems is challenging due to uncertainty and partial observability in the environment. We focus on a multiagent planning model subclass, relevant to urban settings, where agent interactions depend on their "collective influence" on each other rather than on their identities. Unlike previous work, we address a general setting where the system reward is not decomposable among agents. We develop collective actor-critic RL approaches for this setting, and address the problems of multiagent credit assignment and of computing low-variance policy gradient estimates that result in faster convergence to high-quality solutions. We also develop difference-rewards-based credit assignment methods for the collective setting. Empirically, our new approaches provide significantly better solutions than previous methods in the presence of global rewards on two real-world problems modeling taxi fleet optimization and multiagent patrolling, and on a synthetic grid navigation domain.
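Since the abstract leans on difference rewards for a global, non-decomposable reward, a minimal sketch may help fix ideas. Everything below (the function names, the count-indexed global reward G, and the default-action counterfactual) is an illustrative assumption in the general spirit of difference rewards, not the estimator developed in the paper.

```python
import numpy as np

def difference_reward(global_reward, joint_counts, agent_type, agent_action,
                      default_action):
    """Difference-rewards credit for one agent in a count-based (collective) model.

    D_i = G(n) - G(n with agent i's action swapped for a fixed default action),
    where n are the action counts per agent type. Illustrative sketch only;
    all names here are hypothetical.
    """
    counterfactual = joint_counts.copy()
    counterfactual[agent_type, agent_action] -= 1    # remove the agent's own action
    counterfactual[agent_type, default_action] += 1  # replace it with the default
    return global_reward(joint_counts) - global_reward(counterfactual)

# Example: a global reward that is a nonlinear (hence non-decomposable) function
# of the joint action counts, e.g. a penalty for deviating from a target coverage.
G = lambda n: -np.abs(n.sum(axis=0) - np.array([3.0, 3.0])).sum()
counts = np.array([[4, 2]])   # one agent type, counts for two actions
print(difference_reward(G, counts, agent_type=0, agent_action=0, default_action=1))
```

Because the counterfactual only shifts counts rather than re-simulating a specific agent, the same credit applies to every agent of a type that took the same action, which is what makes the idea usable at large population sizes.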


Reviews: Credit Assignment For Collective Multiagent RL With Global Rewards

Neural Information Processing Systems

The paper tackles multi-agent credit assignment, a long-standing problem within multi-agent systems, by extending existing difference-rewards methods to settings in which the population of the system is large. Though the results are relevant and lead to an improvement for large-population systems, the contribution is nonetheless limited to a modification of existing techniques for a specific setting that seemingly requires the number of agents to be large and each agent to observe a count of the agents within its neighbourhood. The results of the paper enable improved credit assignment in the presence of noise from other agents' actions, and an improved baseline leading to reduced variance and, in turn, better estimates of the collective policy gradient (under homogeneity assumptions). The analysis applies to a specific setting in which the reward function has a term that is common to all agents and is therefore not decomposable; the extent to which this property arises in multi-agent systems, however, is not discussed.
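For the variance-reduction point the review raises (a count-conditioned baseline plus a shared policy under homogeneity), here is a brief sketch of the idea, with hypothetical names and shapes rather than the paper's actual algorithm.

```python
import numpy as np

def count_based_advantages(returns, count_features, value_weights):
    """Hypothetical linear critic on count features: baseline = phi(counts) . w.

    Subtracting this baseline from sampled returns is the standard
    variance-reduction step the review alludes to; the specific critic form
    is an assumption, not the paper's architecture.
    """
    baseline = count_features @ value_weights
    return returns - baseline

def collective_policy_gradient(advantages, agent_counts, log_prob_grads):
    """Under homogeneity, one shared policy serves all agents of a type, so each
    (observation, action) gradient term is weighted by how many agents took it."""
    return sum(n * a * g
               for n, a, g in zip(agent_counts, advantages, log_prob_grads))
```

The count weighting is what keeps the gradient estimate tractable for large populations: the estimator never enumerates individual agents, only the counts of agents sharing an observation-action pair.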

